A Language-Independent Approach to European Text Retrieval

Authors

Paul McNamee

James Mayfield

Abstract

We present an approach to multilingual information retrieval that does not depend on the existence of specific linguistic resources such as stemmers or thesauri. Using the HAIRCUT system, we participated in the monolingual, bilingual, and multilingual tasks of the CLEF-2000 evaluation. Our method, based on combining the benefits of words and character n-grams, was effective both for language-independent monolingual retrieval and for cross-language retrieval with translated queries. After describing our monolingual retrieval approach, we compare a translation method using aligned parallel corpora to commercial machine translation software.

Related articles

Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval

We investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding, and contrast them with language-independent approaches, such as character n-gramming. In order to reap the benefits of more than one type of approach, we also consider the effectiveness of combining both types of approaches. We focus on document retrieval in ni...

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Using Eurowordnet in a Concept-Based Approach to Cross-Language Text Retrieval

We present an approach to cross-language text retrieval based on the EuroWordNet (EWN) multilingual semantic database. EuroWordNet is a multilingual, WordNet-like database with basic semantic relations between words for several European languages (English, Dutch, Spanish, Italian, German, French, Czech, and Estonian). In addition to the relations in WordNet 1.5, EWN includes domain labels, cro...

Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features were apportioned to each group of images separately. In this regard, each group image surr...

Publication date: 2000
